On Local Regret
Online learning aims to perform nearly as well as the best hypothesis in
hindsight. For some hypothesis classes, though, even finding the best
hypothesis offline is challenging. In such offline cases, local search
techniques are often employed and only local optimality guaranteed. For online
decision-making with such hypothesis classes, we introduce local regret, a
generalization of regret that aims to perform nearly as well as only nearby
hypotheses. We then present a general algorithm to minimize local regret with
arbitrary locality graphs. We also show how the graph structure can be
exploited to drastically speed learning. These algorithms are then demonstrated
on a diverse set of online problems: online disjunct learning, online Max-SAT,
and online decision tree learning.
Comment: This is the longer version of the same-titled paper appearing in the
Proceedings of the Twenty-Ninth International Conference on Machine Learning
(ICML), 201
Solving Imperfect Information Games Using Decomposition
Decomposition, i.e. independently analyzing possible subgames, has proven to
be an essential principle for effective decision-making in perfect information
games. However, in imperfect information games, decomposition has proven to be
problematic. To date, all proposed techniques for decomposition in imperfect
information games have abandoned theoretical guarantees. This work presents the
first technique for decomposing an imperfect information game into subgames
that can be solved independently, while retaining optimality guarantees on the
full-game solution. We can use this technique to construct theoretically
justified algorithms that make better use of information available at run-time,
overcome memory or disk limitations at run-time, or make a time/space trade-off
to overcome memory or disk limitations while solving a game. In particular, we
present an algorithm for subgame solving which guarantees performance in the
whole game, in contrast to existing methods which may have unbounded error. In
addition, we present an offline game solving algorithm, CFR-D, which can
produce a Nash equilibrium for a game that is larger than available storage.
Comment: 7 pages by 2 columns, 5 figures; April 21 2014 - expand explanations
and theor
Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information.
Comment: Appeared in AAAI 201
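As a rough illustration of the regret-minimization idea that Counterfactual Regret Minimization builds on, the sketch below runs plain regret matching in self-play on rock-paper-scissors. It is not the constrained generalization described in the abstract; the payoff matrix, the function names, and the asymmetric starting regrets are all assumptions chosen for the example.

```python
import numpy as np

# Row player's payoff for (rock, paper, scissors) against each column action.
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])


def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))


def self_play(iterations=20000):
    """Return both players' average strategies after regret-matching self-play."""
    # Asymmetric start (an assumption) so the dynamics are not trivially uniform.
    regrets = [np.array([1.0, 0.0, 0.0]), np.zeros(3)]
    strategy_sums = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        s0 = regret_matching_strategy(regrets[0])
        s1 = regret_matching_strategy(regrets[1])
        # Expected payoff of every pure action against the opponent's current mix.
        u0 = PAYOFF @ s1
        u1 = -(PAYOFF.T @ s0)
        # Regret of an action: its payoff minus the payoff of the mix actually played.
        regrets[0] += u0 - s0 @ u0
        regrets[1] += u1 - s1 @ u1
        strategy_sums[0] += s0
        strategy_sums[1] += s1
    return [total / iterations for total in strategy_sums]


average_strategies = self_play()
```

In a two-player zero-sum game the average strategies, rather than the final ones, are what approach a Nash equilibrium, which for this game is the uniform mix over the three actions.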
Count-Based Exploration with the Successor Representation
In this paper we introduce a simple approach for exploration in reinforcement
learning (RL) that allows us to develop theoretically justified algorithms in
the tabular case but that is also extendable to settings where function
approximation is required. Our approach is based on the successor
representation (SR), which was originally introduced as a representation
defining state generalization by the similarity of successor states. Here we
show that the norm of the SR, while it is being learned, can be used as a
reward bonus to incentivize exploration. In order to better understand this
transient behavior of the norm of the SR we introduce the substochastic
successor representation (SSR) and we show that it implicitly counts the number
of times each state (or feature) has been observed. We use this result to
introduce an algorithm that performs as well as some theoretically
sample-efficient approaches. Finally, we extend these ideas to a deep RL
algorithm and show that it achieves state-of-the-art performance in Atari 2600
games when in a low sample-complexity regime.
Comment: This paper appears in the Proceedings of the 34th AAAI Conference on
Artificial Intelligence (AAAI 2020
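The link between the norm of a partially learned SR and visit counts can be seen in a small tabular sketch. The version below assumes a zero-initialised SR learned with a TD(0) update and uses the inverse of the L1 norm of each row as the bonus; the function names, step size, discount, and the toy trajectory are assumptions for illustration, not details taken from the abstract.

```python
import numpy as np


def learn_sr(trajectory, n_states, alpha=0.1, gamma=0.95):
    """Tabular TD(0) estimate of the SR, initialised to zero (an assumption)."""
    psi = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        indicator = np.zeros(n_states)
        indicator[s] = 1.0
        # TD update of the row belonging to the state that was just left.
        psi[s] += alpha * (indicator + gamma * psi[s_next] - psi[s])
    return psi


def exploration_bonus(psi, eps=1e-8):
    """Bonus inversely proportional to the L1 norm of each SR row (an assumption)."""
    return 1.0 / (np.abs(psi).sum(axis=1) + eps)


trajectory = [0, 1, 0, 1, 0, 1, 0, 2, 0]  # state 3 is never visited
bonus = exploration_bonus(learn_sr(trajectory, n_states=4))
ranking = np.argsort(-bonus)  # states ordered from largest to smallest bonus
```

On this trajectory the never-visited state receives the largest bonus and the state left only once ranks above the two states visited repeatedly, which is the count-like behaviour the abstract attributes to the norm while the SR is still being learned.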
The Case Against Employment Tester Standing Under Title VII and 42 U.S.C. § 1981
In 1964, Congress passed comprehensive legislation aimed at eradicating discrimination in employment, public accommodations, public facilities, public schools, and federal benefit programs. Title VII of this Act directed its aim specifically at stamping out prejudice in employment. Four years later, the Supreme Court resurrected the provisions of § 1 of the Civil Rights Act of 1866, which, among other things, protects citizens, regardless of race or color, in their right to make and enforce [employment] contracts. Together, Title VII and § 1981 serve as the primary legal bases for challenging racially discriminatory actions by private employers. More than thirty years after the passage of Title VII and the Court's resurrection of § 1981, though, society continues to feel the lingering effects of America's history of slavery and segregation in the field of employment. A study by the Urban Institute in the late 1980s and early 1990s determined that black job applicants continued to face discriminatory treatment at all levels of the hiring process. In view of the continuing effects of discrimination in employment, a number of civil rights organizations around the country have employed testing as a means of ferreting out discrimination in the hiring process
- …